Efficient Subgraph Frequency Estimation with G-Tries

Authors

  • Pedro Manuel Pinto Ribeiro
  • Fernando M. A. Silva
Abstract

Many biological networks contain recurring overrepresented elements, called network motifs. Finding these substructures is a computationally hard task related to graph isomorphism. G-tries are a data structure, based on multiway trees, that efficiently identifies common substructures in a set of subgraphs. They are highly successful in constraining the search space when finding the occurrences of those subgraphs in a larger original graph. This yields speedups of up to 100 times over previous methods that aim for exact and complete results. In this paper we present a new efficient sampling algorithm for subgraph frequency estimation based on g-tries. It is able to uniformly traverse a fraction of the search space, providing an accurate unbiased estimation of subgraph frequencies. Our results show that, in the same amount of time, our algorithm achieves better precision than previous methods, as it is able to sustain higher sampling speeds.

Similar resources

Large Scale Graph Representations for Subgraph Census

A Subgraph Census (determining the frequency of smaller subgraphs in a network) is an important computational task at the heart of several graph mining algorithms. Recently, several efficient algorithms have been described. We focus on the g-tries, a data structure that encapsulates the topology of the smaller subgraphs in order to speed up the overall computation. Its algorithm makes extensive...

Labeling Subgraph Embeddings and Cordiality of Graphs

Let $G$ be a graph with vertex set $V(G)$ and edge set $E(G)$. A vertex labeling $f : V(G)\rightarrow \mathbb{Z}_2$ induces an edge labeling $f^{+} : E(G)\rightarrow \mathbb{Z}_2$ defined by $f^{+}(xy) = f(x) + f(y)$, for each edge $xy\in E(G)$. For each $i \in \mathbb{Z}_2$, let $v_{f}(i)=|\{u \in V(G) : f(u) = i\}|$ and $e_{f^+}(i)=|\{xy\in E(G) : f^{+}(xy) = i\}|$. A vertex labeling $f$ of a graph $G...

On Sampling from Massive Graph Streams

We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive graph streams. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to accomplish various estimation goals of graph properties. In the context of subgraph counting, we show how edge sampling weights can be chosen so as to minimize the estimati...

Study on the new graph constructed by a commutative ring

Let R be a commutative ring and G(R) be a graph with vertices as proper and non-trivial ideals of R. Two distinct vertices I and J are said to be adjacent if and only if I + J = R. In this paper we study a graph constructed from a subgraph G(R)Δ(R) of G(R) which consists of all ideals I of R such that I Δ J(R), where J(R) denotes the Jacobson radical of R. In this paper we study about the relation b...

Efficient estimation of graphlet frequency distributions in protein-protein interaction networks

MOTIVATION Algorithmic and modeling advances in the area of protein-protein interaction (PPI) network analysis could contribute to the understanding of biological processes. Local structure of networks can be measured by the frequency distribution of graphlets, small connected non-isomorphic induced subgraphs. This measure of local structure has been used to show that high-confidence PPI networ...

Publication date: 2010